Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.
translated by 谷歌翻译
Due to the lack of human resources for mental health support, there is an increasing demand for employing conversational agents for support. Recent work has demonstrated the effectiveness of dialogue models in providing emotional support. As previous studies have demonstrated that seekers' persona is an important factor for effective support, we investigate whether there are benefits to modeling such information in dialogue models for support. In this paper, our empirical analysis verifies that persona has an important impact on emotional support. Therefore, we propose a framework for dynamically inferring and modeling seekers' persona. We first train a model for inferring the seeker's persona from the conversation history. Accordingly, we propose PAL, a model that leverages persona information and, in conjunction with our strategy-based controllable generation method, provides personalized emotional support. Automatic and manual evaluations demonstrate that our proposed model, PAL, achieves state-of-the-art results, outperforming the baselines on the studied benchmark. Our code and data are publicly available at https://github.com/chengjl19/PAL.
translated by 谷歌翻译
Optimal Transport (OT) provides a useful geometric framework to estimate the permutation matrix under unsupervised cross-lingual word embedding (CLWE) models that pose the alignment task as a Wasserstein-Procrustes problem. However, linear programming algorithms and approximate OT solvers via Sinkhorn for computing the permutation matrix come with a significant computational burden since they scale cubically and quadratically, respectively, in the input size. This makes it slow and infeasible to compute OT distances exactly for a larger input size, resulting in a poor approximation quality of the permutation matrix and subsequently a less robust learned transfer function or mapper. This paper proposes an unsupervised projection-based CLWE model called quantized Wasserstein Procrustes (qWP). qWP relies on a quantization step of both the source and target monolingual embedding space to estimate the permutation matrix given a cheap sampling procedure. This approach substantially improves the approximation quality of empirical OT solvers given fixed computational cost. We demonstrate that qWP achieves state-of-the-art results on the Bilingual lexicon Induction (BLI) task.
translated by 谷歌翻译
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for an automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and propose the participants to design an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating an up to 500 FPS rate and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper.
translated by 谷歌翻译
Myocardial pathology segmentation (MyoPS) can be a prerequisite for the accurate diagnosis and treatment planning of myocardial infarction. However, achieving this segmentation is challenging, mainly due to the inadequate and indistinct information from an image. In this work, we develop an end-to-end deep neural network, referred to as MyoPS-Net, to flexibly combine five-sequence cardiac magnetic resonance (CMR) images for MyoPS. To extract precise and adequate information, we design an effective yet flexible architecture to extract and fuse cross-modal features. This architecture can tackle different numbers of CMR images and complex combinations of modalities, with output branches targeting specific pathologies. To impose anatomical knowledge on the segmentation results, we first propose a module to regularize myocardium consistency and localize the pathologies, and then introduce an inclusiveness loss to utilize relations between myocardial scars and edema. We evaluated the proposed MyoPS-Net on two datasets, i.e., a private one consisting of 50 paired multi-sequence CMR images and a public one from MICCAI2020 MyoPS Challenge. Experimental results showed that MyoPS-Net could achieve state-of-the-art performance in various scenarios. Note that in practical clinics, the subjects may not have full sequences, such as missing LGE CMR or mapping CMR scans. We therefore conducted extensive experiments to investigate the performance of the proposed method in dealing with such complex combinations of different CMR sequences. Results proved the superiority and generalizability of MyoPS-Net, and more importantly, indicated a practical clinical application.
translated by 谷歌翻译
关于点击率(CTR)预测的最新研究通过对更长的用户行为序列进行建模,已达到新的水平。除其他外,两阶段的方法是用于工业应用的最先进的解决方案(SOTA)。两阶段方法首先训练检索模型,以事先截断长行为序列,然后使用截短序列训练CTR模型。但是,检索模型和CTR模型是分别训练的。因此,CTR模型中检索到的子序列不准确,它降低了最终性能。在本文中,我们提出了一个端到端范式来建模长行为序列,与现有模型相比,该序列能够实现卓越的性能以及出色的成本效益。我们的贡献是三倍:首先,我们提出了一个名为ETA-NET的基于哈希的有效目标(TA)网络,以基于低成本的位置操作来启用端到端的用户行为检索。提出的ETA-NET可以通过顺序数据建模的数量级来降低标准TA的复杂性。其次,我们建议将通用系统体系结构作为一种可行的解决方案,用于在工业系统上部署ETA-NET。特别是,与SOTA两阶段方法相比,ETA-NET已部署在TAOBAO的推荐系统上,并在CTR上带来了1.8%的升降机和3.1%的升降机(GMV)。第三,我们在离线数据集和在线A/B测试上进行了广泛的实验。结果证明,在CTR预测性能和在线成本效益方面,所提出的模型大大优于现有的CTR模型。 ETA-NET现在为TAOBAO的主要流量提供服务,每天为数亿用户提供服务。
translated by 谷歌翻译
元强化学习(META-RL)是一种有前途的方法,使代理商能够快速学习新任务。但是,由于仅由奖励提供的任务信息不足,大多数元元素算法在多任任务方案中显示出较差的概括。语言条件的元RL通过匹配语言指令和代理的行为来改善概括。因此,从对称性学习是人类学习的一种重要形式,因此将对称性和语言指令结合到元素rl可以帮助提高算法的概括和学习效率。因此,我们提出了一种双MDP元提升学习方法,该方法可以通过对称数据和语言指令有效地学习新任务。我们在多个具有挑战性的操作任务中评估了我们的方法,实验结果表明我们的方法可以大大提高元强化学习的概括和效率。
translated by 谷歌翻译
低光图像增强功能是一个经典的计算机视觉问题,旨在从低光图像中恢复正常暴露图像。但是,该领域常用的卷积神经网络擅长对空间结构域中的低频局部结构特征进行取样,从而导致重建图像的纹理细节不清楚。为了减轻这个问题,我们建议使用傅立叶系数进行新的模块,该模块可以在频率阶段的语义约束下恢复高质量的纹理细节并补充空间域。此外,我们使用带有不同接收场的扩张卷积为图像空间域设计了一个简单有效的模块,以减轻频繁下采样引起的细节损失。我们将上述部分集成到端到端的双分支网络中,并设计一个新颖的损失委员会和一个自适应融合模块,以指导网络灵活地结合空间和频域特征,以产生更令人愉悦的视觉效果。最后,我们在公共基准上评估了拟议的网络。广泛的实验结果表明,我们的方法的表现优于许多现有的最先进的结果,表现出出色的性能和潜力。
translated by 谷歌翻译
聚类是基于它们的相似性对组对象的重要探索性数据分析技术。广泛使用的$ k $ -MEANS聚类方法依赖于一些距离的概念将数据划分为较少数量的组。在欧几里得空间中,$ k $ -Means的基于质心和基于距离的公式相同。在现代机器学习应用中,数据通常是作为概率分布而出现的,并且可以使用最佳运输指标来处理测量值数据。由于瓦斯坦斯坦空间的非负亚历山德罗夫曲率,巴里中心遭受了规律性和非舒适性问题。 Wasserstein Barycenters的特殊行为可能使基于质心的配方无法代表集群内的数据点,而基于距离的$ K $ -MEANS方法及其半决赛计划(SDP)可以恢复真实的方法集群标签。在聚集高斯分布的特殊情况下,我们表明SDP放松的Wasserstein $ k $ - 金钱可以实现精确的恢复,因为这些集群按照$ 2 $ - WASSERSTEIN MERTRIC进行了良好的分离。我们的仿真和真实数据示例还表明,基于距离的$ K $ -Means可以比基于标准的基于质心的$ k $ -Means获得更好的分类性能,用于聚类概率分布和图像。
translated by 谷歌翻译
最近,一些基于跨度的方法实现了联合方面态度分析的令人鼓舞的表现,该方法首先通过检测方面边界来提取方面(方面提取),然后对跨度级别的情感(情感分类)进行分类。但是,大多数现有方法要么顺序提取特定于任务的功能,导致功能交互不足,要么以并行方式编码方面功能和情感功能,这意味着除输入共享外,每个任务中的特征表示形式在很大程度上彼此独立。他们俩都忽略了方面提取和情感分类之间的内部相关性。为了解决这个问题,我们在新颖地提出了一个层次交互式网络(HI-ASA),以适当地对两个任务之间的双向交互作用,其中层次交互涉及两个步骤:浅层相互作用和深层交互。首先,我们利用交叉缝制机制选择性地将不同的特定任务特征组合为输入,以确保正确的双向相互作用。其次,将共同信息技术应用于输出层中两个任务之间的互惠学习,因此方面输入和情感输入能够通过反向传播编码其他任务的特征。在三个现实世界数据集上进行的广泛实验证明了HI-ASA优于基准。
translated by 谷歌翻译